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ABSTRACT 

We seek to improve estimates of the power spectrum covariance matrix from a limited num- 
ber of simulations by employing a novel statistical technique known as shrinkage estimation. 
The shrinkage technique optimally combines an empirical estimate of the covariance with a 
model (the target) to minimize the total mean squared error compared to the true underlying 
covariance. We test this technique on N-body simulations and evaluate its performance by es- 
timating cosmological parameters. Using a simple diagonal target, we show that the shrinkage 
estimator significantly outperforms both the empirical covariance and the target individually 
when using a small number of simulations. We find that reducing noise in the covariance es- 
timate is essential for properly estimating the values of cosmological parameters as well as 
their confidence intervals. We extend our method to the jackknife covariance estimator and 
again find significant improvement, though simulations give better results. Even for thousands 
of simulations we still find evidence that our method improves estimation of the covariance 
matrix. Because our method is simple, requires negligible additional numerical effort, and 
produces superior results, we always advocate shrinkage estimation for the covariance of the 
power spectrum and other large-scale structure measurements when purely theoretical mod- 
eling of the covariance is insufficient. 

Key words: methods: statistical - large-scale structure of the Universe. 



o 



X 



1 INTRODUCTION 

Large-scale structure statistics, especially power spectra, provide 
precise constraints on cosmological theories. Accurate measure- 
ments are now possible with large-volume surveys and advancing 
computational power. However, the measured power spectrum is 
not the only required ingredient for estimating cosmological pa- 
rameters; the covariance matrix also carries a great deal of infor- 
mation that is vital for properly estimating parameter values and 
their confidence intervals. Observational effects such as the sur- 
vey geometry, redshift-space distortions, and non-linear clustering 
make theoretical modeling of the covariance matrix difficult, and 
often simulations are used to study them in detail. Covariance ma- 
trices estimated from a finite number of simulations will contain 
noise. Cosmological parameter estimation requires the inverse of 
the covariance matrix to properly weight the measurements. Matrix 
inversion is an inherently no n-linear operation that i s sensitive to 
the noise of all the elements. IChen & Szapudil ( l2006h showed that 
when the off-diagonal elements of a covariance matrix are exces- 
sively noisy it is better for parameter estimation to use a diagonal 
approximation of the covariance. This reduces the effects of noise, 
but ignores important information in the covariance. 

Covariance matrices for large-scale structure measurements 
are often estimated using the unbiased empirical covariance ma- 
trix, S (see equation[8l(, a close relative of the maximum-likelihood 
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estimator, S'*^^' = ^^S. These estimators work well in the 

n 

regime where the number of repeat observations, n, is much greater 
than the number of parameters measured for each observation, p. 
However, in the regimes where n ^ p ov n <^ p the covari- 
ance matrix estimates become ill-conditioned and unstable during 
inversion, which is necessary for optimal weighting of the data. 
This is an indication that these estimators do not produce good 
approxima tions of the true underlying covariance matrix in these 
regimes. Ef^oi? ('1982') provides some insight into the difference 
between maximum-likelihood as a summarizer and as an estima- 
tor. Maximum-likelihood is an excellent summarizer of data in 
the sense of trying to represent the important statistical informa- 
tion about a dataset in a small set of numbers. Though maximum- 
likelihood is asymptotically optimal for estimation in the limit of 
infinite data, the use of this summary of information for the pur- 
pose of making estimates w ith a finite set of data is not always 
the best option. ISteinI ( Il956l) proved that one can construct esti- 
mators in high-dimensional (d ^ 3) inference problems that out- 
perform maximum-likelihood estimators in the sense of minimiz- 
ing the total mean squared error. Maximum-likelihood produces 
the best estimates of individual parameters, but the alternatives 
can often reduce the error on many of the parameters while only 
slightly increasingthe error on a few, resulting in an overall im- 
provement. ISteinI ( 11956 ) also showed that the maximum likelihood 
estimator has the best performance among estimators that trans- 
form correctly under translation, implying that any estimator that 
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outperforms maximum-likelihood will necessarily involve an arbi- 
trary choice; 

'Sc hafer & StrimmeJ feOOSh employ a method known as 
shrinkage estimation to construct covariance matrices for func- 
tional genomics measurements in the n <gi p regime. Their tech- 
nique optimally combines a high-dimensional estimate that has lit- 
tle or no bias with a low-dimensional estimate that may be biased 
but has much less variance. The result minimizes the total mean 
squared error, which is the sum of bias (squared) and variance. 
They argue that their method can also perform some amount of 
regularization, resulting in a covariance matrix that has a full set of 
positive-definite eigenvalues and is well-conditioned (i.e., the ratio 
of the largest to smallest eigenvalue is not so large that invers i on be- 
comes unstable). They employ a lemma from lLedoit & Wol3 ^20031) 
to analytically calculate the optimal linear combination of the low 
and high dimensional estimates. 

In this paper our goal is to provide a simple recipe for us- 
ing shrinkage estimation to improve the covariance matrix of the 
matter power spectrum from a limited number of simulations over 
the ubiquitous sample covariance estimator. Our method aims to 
reduce the total noise while retaining as much information about 
real covariance in the simulations as possible. Shrinkage estimation 
achieves this by optimally combining a theoretical model with the 
empirical estimate. We will assess the improvements our method 
offers by examining the performance of the covariance matrices 
through inversion and use in cosmological parameter estimation. 
Although we focus on the matter power spectrum, the shrinkage 
technique is relevant for many studies in large-scale structure and 
cosmology. 

The outline of this paper is as follows: in Section |2] we intro- 
duce shrinkage estimation and describe its application to covari- 
ance matrices. Section[3]applies the shrinkage technique to several 
toy problems before moving to a more complicated example in- 
volving galaxy clustering. We describe our technique for measuring 
matter power spectra from N-body simulations in section|4] In Sec- 
tion|5]we construct several estimates of the power spectrum covari- 
ance matrix and compare their performance by estimating cosmo- 
logical parameters. Finally we review our results, make recommen- 
dations, and discuss future directions of this project in Section[6] 



2 SHRINKAGE ESTIMATION 
2.1 The Method 

Much of this sec tion summarizes the introdu ction to shrinkage es- 
timation given in lSchafer & Strimmerl ^20051) . Suppose we are esti- 
mating a vector ip (of length p) from a set of n measurements using 
two different models. One of the models has many free parameters 
and produces an estimate, u, with little (or no) bias, but the vari- 
ance may be significant due to the number of free parameters. The 
second model (called the target) has many fewer (or no) free param- 
eters and produces an estimate, t, which will have smaller variance 
but may be biased. We construct a new estimate, u* , from a linear 
combination of these two models, given by 



At + (1 - \)u 



(1) 



where A G [0, 1] is called the shrinkage intensity. The question 
now becomes how to choose A in an optimal way. A common way 
to optimize an estimator is to minimize the expected mean squared 
error, given by the risk function 



V 



(2) 



where the angle b rackets indicate the expectation value. 
iLedoit & Wolj ( 2003h introduced an analytic solution for the op- 
timal shrinkage intensity. A*. Prior to this solution shrinkage esti- 
mation was much less practical because numerically complicated 
and expensive methods were necessary to find the optimal shrink- 
age intensity. The analytic solution is 

Var('Ui) — Cov(ti,iti) — B'ms{ui){ti — m) 

where Var, Gov, and Bias are the true varia nce, covariance, and 
bias, respectively. For a practical estimator, ISchafer & Strimrnerl 
12005) suggest estimating A* as 



A* = 



Var(ui) — Cov(fi,iti) — Bias(ui)(ti — m) 



(4) 



where Var, Gov, and Bias are the unbiased sample estimates of 
Var, Gov, and Bias, respectively. The Bias term can be ignored if 
u is an unbiased estimator. 



2.2 Application to Covariance Matrices 

We now specialize the shrinkage estimation technique to covari- 
ance matrices. Suppose we have n sets of data and we measure a 
data vector x of length p for each of them. Let x\''^ be the fc*'' (of n 
total) observation of the i"^ (of p total) element of the data vector 



The estimated empirical mean is given by Xi = i y^"_-| 
define 



(k) 



We 

(5) 
(6) 



and write the unbiased empirically estimated covariance matrix of 
the data, S, as 

n 



ij — Gov(xi, Xj 



-Wi 



n - 1 " 
Explicit substitution results in the usual 



Si' 



fc=l 



1 ^ 



(7) 



(8) 



Similarly we can estimate the covariance of the elements of the 
covariance matrix of the data, given as 



Gov(Sij, 5im) 



(n- 1)3 ^ 



with the variance of an individual entry given by Var(S'ij) = 
Cov(S'i-, , Sij). For shrinkage estimation we let S take the role of 
u and supply a target covariance matrix T to take the role of t. The 
optimal shrinkage intensity and resulting covariance matrix C are 
given by 



A* 



^,^Var(S.,)-Gov(T,,- 
A*T+ (1 - A*)S. 



(10) 
(11) 
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Table 1. Shrinkage estimation of the mean of a noisy vector. 
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(MSE(it*)) 
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(MSE(ii)) 
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(MSE(t)> 


MSE(t) 
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1.0 
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100 
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0.003 
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1.02 
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0.0002 
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0.45 


0.46 
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1.00 
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0.20 


0.201 


0.0003 


0.80 


0.80 


1.00 


1.00 


4.00 


4.00 








50 




0.67 


0.696 


0.008 


0.67 


0.71 


1.00 


2.00 


1.00 


1.00 










50 


0.50 


0.529 


0.01 


0.25 


0.26 


0.50 


0.49 


0.50 


0.50 



Results of simulations to test shiinkage estimation of the mean of a noisy vector. The first five columns are the input values for the simulations. The first row 
gives the fiducial values and subsequent rows only indicate parameters that were varied. The remaining columns list analytically predicted (indicated by ()) 
and measured (where a and Aa indicate the sample mean and standard deviation for a) quantities from the outputs of the simulations. We used 100 simulations 
for each set of input parameters. See Section lxTl for an explanation of the parameters and quantities. 



If the A* estimate is greater than one, then A* = 1 is enforced, 
implying that only the target matrix is used. If the A* estimate is 
less than zero, then A* = is enforced, implying that only the 
empirical covariance matrix is used. The numerator of equation [TO] 
implies that as the variances of the elements of S decrease (e.g., ap- 
proaching the p regime) the shrinkage estimate smoothly ap- 
proaches the empirical covariance. The denominator in eguationfTOl 
also ensures that if the chosen target matrix is very different from 
the empirical covariance then the estimator will tend to the empir- 
ical covariance. Thus an inappropriate choice of target should not 
make the final results any worse, but there will be little gain in effi- 
ciency. 

The Coy {Tij, Sij) term in the numerator of equation [TO] ac- 
counts for the fact that S and T are estimated from the same data. 
If T is fixed then the term is zero. If some of the elements of T are 
taken directly from S then the Gov term exactly cancels the Var 
te rm and those elements do n ot affect the estimate of A*. Tab. 2 
of lSchafer & StrimmeJ ( l2005l) shows a variety of worked examples 
for common targets that may or may not depend on the empirically 
estimated covariance. In their examples they ignore moments of 
higher order than Var(S'ij). 



3 TOY EXAMPLES 

Before tackling large-scale structure measurements we present the 
application of shrinkage estimation to toy models. In Section IbTI 
we employ shrinkage to estimate the mean of a noisy vector. For 
that example we can calculate the expected shrinkage intensity an- 
alytically. In Section IT2I we present a toy example of covariance 
estimation. 



3.1 Estimating the Mean of a Noisy Vector 

In our first toy example of the shrinkage technique we will estimate 
the mean of a vector from a set of noisy realizations using the for- 
malism from Section im Given an input mean vector, i/i, the distri- 
bution for an element of the noisy realization is Xi = N(ipi, af), a 
normal distribution with mean tpi and variance a^. The i*** element 
(of p total) of the fc**^ realization (of n total) is written x^''\ and 
our high-dimensional estimate of the mean is given by 



n ■ 



(12) 



Given a fixed target, t, we will calculate the expected value for 
the optimal shrinkage intensity, (A*), as given by equation[3] If we 
assume the numerator and denominator are uncorrelated we can 
write 



(A*) = 



ELiVar(^0 



From our definitions we know that 



(13) 

(14) 
(15) 



which we can use to find the expectation values of our estimators 



{U^) = 




,2,1 2 



(16) 



(17) 



Fixing the values for 1/;^, ai, and ti to be ?/', cr, and t, respectively, 
we predict the shrinkage intensity to be 

(A*) ~ a - (18) 

n(t — ip)^ + (T^ 

This formula displays the behavior we expect. As n becomes large, 
(A*) approaches zero and we will rely mostly on the empirical esti- 
mate of the mean. As t approaches tp, (A*) approaches unity as we 
have chosen the perfect target. We can also calculate the expected 
mean squared errors for the high-dimensional, low-dimensional, 
and shrinkage estimators and find they are given by 



(MSE(u)) 



P — , 



{MSE(t)) = p{^-tf 



{MSE(u*)) 



(19) 
(20) 

(21) 



respectively. 

We generated simulations to check the validity of our analytic 
predictions. Each simulation starts by generating the n realizations 
cc*^*'. The sample mean u is calculated for the set of realizations 
and used with the target t to generate the shrinkage estimate u* . 
The shrinkage intensity. A*, for the set of realizations is calculated 
using equation |4] and jackknife resampling of equation [12] is used 
to estimate the Var(iti) term. We generated 100 simulations for 
each set of input parameters we tested and the results are shown in 
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Figure 1. Comparison of the Monte Carlo (MC) and shrinkage (MC+S) estimates for the toy model covariance from Section [X2l The plot at left shows the 
mean squared eiTor (MSB) between the estimate and the known input covariance as a function of the number of realizations, n. Results using only the target are 
also shown. The plots at right show the eigenvalue spectra for the covaiiance estimators using different numbers of realizations, n. The known true eigenvalue 
spectrum and the eigenvalues of the target are also shown. All plots represent the averages of 100 simulations for each number of realizations, n. 



Table[T] The results of our simulations match the analytical predic- 
tions reasonably well and also show that the shrinkage estimator 
outperforms both the empirical estimate and the target in a mean 
squared error sense. 



3.2 Estimating the Covariance 

Our second toy model applies the formalism of Section 12.21 for 
shrinkage estimation of a covariance matrix. In this case the an- 
alytical predictions would be prohibitively tedious, so we present 
only the results of simulations. Each simulation starts by generat- 
ing the realizations, a;'''' , where the distribution of each element is 
X['^^ = N{0, a^). From each set of realizations we construct two 
estimates of the covariance matrix. The Monte Carlo (MC) method 
uses the standard empirical covariance estimate as given by equa- 
tion [8] We also compute a shrinkage version of the Monte Carlo 
estimator (MC+S) from equation [TT] using the identity matrix as a 
target. The two estimators are compared by inspecting the eigen- 
value spectra and by computing the mean squared error between 
the estimate and the known true covariance matrix, which is 
where I is the identity matrix. The mean squared error between two 
matrices A and B is given by the Frobenius norm of the difference, 



(22) 



For our simulations we fixed p = 18 and a = 1.1 and 
compared the performance of the covariance estimators as a func- 
tion of the number of realizations, n. We ran 100 simulations for 
each value of n and averaged the results. For n = 40, 400, 4000 
the shrinkage intensities were A* = 0.92 ± 0.08, 0.61 ± 0.06, 
0.136 ± 0.006. Figure [T] shows the results for the two estimators 
as well as the results when only using the target. The shrinkage 
estimator gives a mean squared error that is equal to or less than 
both the empirical covariance and the target by itself for all n. The 
eigenvalue spectrum for the shrinkage estimator is also closer to 
the correct spectrum than for the empirical covariance at fixed n. 
This indicates that using the shrinkage estimator is in some sense 
equivalent to using more simulations. 



4 SIMULATIONS AND MEASUREMENTS 

We divide the Hubble Volume l lEvrard et al.l |2002|) ACDM 
simulation into 4096 sub-volumes, each with a sidelength of 



187.5 Mpc/h. We measure the power spectrum in each sub- 
volume using a simple code that imp l ement s the same basic al- 
gorithm as presented in I Szapudi et alj j2005l) and employs a Fast 
Fourier Transform (EFT). The EFT was performed on a 64'^ grid 
and the power spectrum was measured in 1 8 logarithmically-spaced 
bins from k — 0.0367 to A: = 0.920. For plotting and model 
comparison purposes we calculate the average k from the actual 
modes in each bin for the bin centers. The power spectrum cal- 
culated on a pixelized grid is the product of the true power spec- 
trum with (the square of) the Fourier transform of the pixel win- 
dow function. Our measured power spectrum can be written as 
f'mcas(fc) = P{k)/\Wp{k)\'^ where the Fourier transform of the 
pixel window function, Wp, is given by 



fik) = 



i(7rfc/(27r/L)) 



7rfc/(27r/L) ' 
\Wp(k)f = P{k,)PiK)p{k,), 



(23) 
(24) 



and L is the length of the side of apixel. Our pixels are 2.93 Mpc/h 
on each side. 

We used CAMB jLewis et alj l2000t) to calculate a model 
transfer function and matter power spectrum usin g cosmologi- 
cal p arameter values to match the Hubble Volume lEvrard et al.l 
l2002h ACDM simulation: Q,n = 0.3, Qa = 0.7, h = 
Ho/100 kms-^Mpc"^ = 0.7, fit/i^ = 0.0196, and n, = 1. 
We normalized the resulting power spectrum, PcAMB(fc), to have 
a (linear) as = 0.9 and then applied the non-linear halofit 
JSmith et"al]|2003b correction. Thus the normalization matches lin- 
ear theory for large scales (low k), but the shape includes non-linear 
clustering corrections at smaller scales (higher k). 

The measured power spectrum is the convolution of the true 
power spectrum with the squared Fourier transform of the survey 
window function. When testing a model for the power spectrum we 
must perform this convolution before comparing it to our measure- 
ment. The convolved power spectrum, , is given by 



P^{k) = / dq P{k - q)\Ws{q)\' 



(25) 



where P is the input (theoretical) power spectrum. Because the sur- 
vey is the same shape as our pixels we can write the (normalized) 
survey window function in terms of the pixel window function as 
\Ws\'^ = iL/2TTf\Wp\^ where L is now the survey sidelength. To 
simplify the convolution we used the spherically averaged trans- 
form of the window function (calculated via Monte Carlo) and re- 




k (h/Mpc) 

k (h/Mpc) 



Figure 2. Plot of the input theoretical power spectrum (Pcamb), the 
theoretical power spectrum convolved with the survey window function 
(^CAMb)' averaged power spectrum measured from all of the 

sub-volumes ((Pmoas))- Inset shows the spherically averaged survey win- 
dow function. 
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Table 2. Statistics of o-g and eiTor bar estimates. 
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(-^8) 




(A) 




Reference 


0.870 


0.041 


0.042 


0.002 


Monte Carlo 


0.853 


0.088 


0.031 


0.006 


Monte Carlo Target Only 


0.870 


0.042 


0.014 


0.002 


Monte Carlo -I- Shrinkage 


0.872 


0.042 


0.027 


0.008 


Jackknife 


0.790 


0.102 


0.015 


0.005 


Jackknife Target Only 


0.869 


0.044 


0.013 


0.003 


Jackknife -i- Shrinkage 


0.850 


0.047 


0.021 


0.007 



The mean and standard deviation of the estimates of the maximum- 
likelihood estimate, ag, and the one-sigma error bar. A, using different 
methods to estimate the covariance matrix. 

I2OOOI) would be more accurate. This could cause some bias in our 
recovered value of ag, but it should not affect our assessment of the 
relative performance of different covariance estimators. The covari- 
ance matrices for the normal and offset lognormal cases are related 
by a change of variables. 



duced the convolution of the spherically symmetric power spectrum 
to a two-dimensional integral, given by 

P^(k)^2-K I dqq'\W,{q)\' I dxP{^/k'^+q^ - 2kqx).(26) 
Jo J~i 

In practice the integral over dx was done with Romberg integra- 
tion and the integration over dq was performed with the extended 
trapezoidal rule at the k values where we had calculated the spheri- 
cally averaged | Ws (fc) P • Power spectrum values inside the integral 
were evaluated using cubic spline interpolation in (log k, log P) 
and power law extrapolation at low and high k. Fig. |2] shows the 
input theoretical power spectrum (Pcamb), the convolved model 
power spectrum (Pqamb) > snd the average of the measured power 
spectra from all of the sub-volumes ((Pmcas))- The inset shows the 
spherically averaged survey window function. 

We wish to evaluate our covariance estimates in the context of 
cosmological parameter fitting. Given the relatively small volume 
of each of our sub-volumes we decided to fit for only the power 
spectrum normalization in each sub-volume, parameterized by ug 
(linear). We write the log-likelihood (for a fixed covariance matrix) 
as 

d{as) = Pmcsik) - (^^y P^^^^{k,as = 0.9), (27) 

log/:(ag)cx-id^C-^d = -ix' (28) 

where C is the covariance matrix being tested. We use erg to fit 
for the amplitude of the power spectrum in the linear regime, and 
we assume that the difference in shape in the non-linear regime is 
minimal for values of erg that are close to our fiducial model of 
(jg — 0.9. We numerically find the maximum likelihood value, as, 
and define the upper and lower one sigma error bars, ±A, where 

log£(ag ± A) - log£(ag) = -1/2. 

Our likelihood analysis assumes that the bandpower measure- 
ments of the power spectrum are normally distributed. For most of 
our bandpowers this is a good approximation , but there are several 
for which an offset lognormal distribution jSond, Jaffe. & Knoxl 



5 COVARIANCE MATRIX ESTIMATES 

In this section we compare the performance of several techniques 
for estimating the covariance matrix of our power spectrum mea- 
surements. As a baseline comparison we calculate the covariance 
matrix from all 4096 sub-volumes using equation [8] For each 
method of covariance matrix estimate we measure as and A us- 
ing the power spectra measured from the sub-volumes and show 
the distributions of the trg and A values. The mean and standard 
deviations of those quantities are presented in Table |2] 



5.1 Reference 

As a reference we estimate the covariance matrix of our power 
spectrum measurement by applying equation [8] to our measure- 
ments from all 4096 sub-volumes. There are 18 x 19/2 = 171 
independent elements in the covariance matrix, thus we are in the 
regime where we have many more realizations than elements to be 
estimated and the usual covariance estimator should give reason- 
able results. The solid black line in Fig.[3]is a histogram of the re- 
sults of estimating as using the reference covariance matrix and the 
power spectra measured from each of the 4096 sub- volumes. The 
upper panel shows the distribution of as and the lower panel shows 
the distribution of the error bar estimates (absolute value of both 
upper and lower). The mean and standard deviation of these his- 
tograms is presented in Table |2] The agreement between the width 
of the best-fit distribution, a^^, and the mean error bar estimate, 
(A), indicates that the covariance matrix is properly estimating the 
likelihood distribution. The width of the error bar distribution, a a, 
is small, indicating that the error bar estimate is usually very close 
to the correct value. The mean of the maximum-likelihood esti- 
mates, (as), is 0.870 instead of our known input value of 0.9, but 
we know that our modeling of the power spectrum into the non- 
linear regime is not perfect so this small offset is not worrisome for 
our purposes. For the remainder of this section we assume that the 
results using the reference covariance matrix are a good approxi- 
mation to those that would be obtained using the true underlying 
covariance matrix. See Section|6]for further discussions. 
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Figure 3. (Area normalized) distributions of maximum-likelihood value, 
CTg , and error bar, A, estimates for the Reference, Monte Carlo (MC), Monte 
Carlo Target only (Target), and the Monte Carlo + Shrinkage (MC+S) co- 
variance matrix estimates. 



5.2 Monte Carlo 

Next we test covariance matrices estimated with equation [8]but us- 
ing a small number of sub-volumes, which we call the Monte Carlo 
method. We use sets of 40 (randomly chosen, non-overlapping) 
sub-volumes to test the regime where we have more simulations 
than diagonal elements of the covariance matrix (18), but fewer 
simulations than independent elements (171). From 4096 sub- 
volumes we can create 102 separate covariance matrix estimates. 
To obtain smooth histograms in Fig.|3]we test each covariance ma- 
trix estimate against 40 randomly chosen P{k) measurements from 
other sub- volumes. The statistics in Table[2]are calculated using one 
randomly chosen P{k) measurement per covariance matrix. The 
statistics do not depend on how many randomly chosen P{k) mea- 
surements are used for each covariance matrix estimate. The upper 
panel of Fig.[3]shows that the distribution of trg is too wide, indicat- 
ing that a parameter analysis using a covariance matrix estimated 
with this method will often return a value far from the mean. The 
lower panel of Fig.|3]shows that the error bar is typically underesti- 
mated by ~ 25%. The width of the estimated error bar distribution 
is also much wider than for the reference covariance matrix esti- 
mate. These effects are the result of using a very noisy estimate of 
the covariance matrix. 
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Figure 4. Plot of the diagonal elements of the reference covariance matrix 
estimated from all of the sub-volumes, of a linear theory model for the co- 
variance, and of the 102 target matrices for the Monte Carlo + Shrinkage 
estimates. Inset shows the reference correlation matrix in a linear stretch. 



5.3 Monte Carlo -i- Shrinkage 

Our first test of the shrinkage approach is to apply shrinkage esti- 
mation to the Monte Carlo method described in the previous sec- 
tion. First we need to choose a target covariance matrix, T. In linear 
theory we expect the covariance matrix of the power spectrum to be 
diagonal. Off-diagonal terms arise in practice from the survey win- 
dow function and non-linear clustering effects. We use a diagonal 
target matrix to simulate a situation where we have some idea about 
the structure of the covariance matrix but we know our model is not 
exact. Our target matrix takes the form 





Sii 



{ki) ^ 0.14h/Mpc 
{h) > 0.14h/Mpc 



(29) 



where we use a different method in the linear and non-linear 
regimes. For bins in the linear regime we use our convolved 
model for the power spectrum, PcjMsi^i)^ ''id the number 
of k modes in each bin, Nj, to pre dict the covariance (e.g., 
iHamilton. Rimes. & Scoccimarrdl2006l) . In the non-linear regime 
we use the diagonal of the empirically estimated covariance from 
the 40 sub-volumes. Fig. |4] shows the diagonal elements of the 
reference covariance matrix, the linear theory model, and the 102 
target matrices. Inset is the reference correlation matrix, Rtj = 
CijI -y/ Cii Cjj, showing that the covariance matrix is strongly di- 
agonal until well into the non-linear regime. Our results are robust 
to changes in the non-linear cutoff by several bins in either direc- 
tion. 

We calculate the optimal shrinkage intensity. A*, for each of 
the 102 Monte Carlo estimates, S, according to equation [To] We 
apply the Cov {Tij ,Sij) term to the diagonal elements of T that are 
taken from S. We find values for A* distributed evenly between 0.1 
and 1.0 (see Fig.|7j. We produce each of our 102 new estimates of 
the covariance matrices, C, from a linear combination of S and T 
according to equation [TT] We perform the same tests as described 
in section 15.21 and compare the results in Fig. [3] and Table |2] The 
most striking result is that the maximum-likelihood estimates, as, 
follow a very similar distribution to that for the reference matrix, 
indicating that the parameter values are now correctly estimated. 
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Figure 5. (Area normalized) distributions of maximum-likelihood value, 
(Tg, and en'or bar, A, estimates for the Reference, Jackknife (JK), Jackknife 
Target only (Target), and the Jackknife + Shrinkage (JK+S) covaiiance ma- 
trix estimates. 



5.4 Jackknife 

Recently a resampling technique know as the jackknife method has 
been used to estimate covariance matrices for large-scale structure 
measurements from the data set itself. The method works by di- 
viding the data volume into n cells of roughly the same size and 
recalculating the measurement n times, each time with a different 
cell left out. The variance between the measurements can be ad- 
justed to try and calculate the variance corresponding to the entire 
volume. In practice one replaces equation[5]with 

^«^(n_l)^^^(.)__^^^(.)__^ (30) 

and then calculates the covariance matrix with equation|7] resulting 
in the usual 

n 

n ^ — ^ 

fc=i 

We divided each sub-volume into 3"^ — 27 cells and modified our 
code to calculate the power spectrum with one cell removed. Our 
code incorporates a volume correction, the lowest order edge cor- 
rection in Fourier space. For each of the 4096 sub- volumes we esti- 
mate (Tg and A using the power spectrum and the jackknife covari- 
ance matrix from the same sub-volume. The results are compared 
to the reference case in Fig.[5]and listed in Table|2] 

The distribution of ag is much wider than for the reference 
covariance matrix, indicating that the noise in the covariance esti- 
mate causes incorrect parameter estimation. This is similar to the 
result for the Monte Carlo method. The jackknife estimates of trg 
also peak at a noticeably lower value than for the reference covari- 
ance, though the two histograms are in roughly lo agreement given 
the width of the distribution for the jackknife case. The error bars 
estimated in the jackknife case are typically underestimated by a 
factor of almost three compared to the reference covariance matrix 
and nearly an order of magnitude compared to the actual width of 
the jackknife distribution of ag. 



The error bars are still underestimated, but the distribution is very 
similar to that for the normal Monte Carlo estimator. 

Fig. [3] and Table |2] also show results using only the diagonal 
target. The values of ag follow the correct distribution, indicating 
that the estimated parameter values are fine, but the error bars are 
much more severely underestimated. This is expected because our 
target matrix is diagonal and we are using information from far 
enough into the non-linear regime to know that we are missing 
some important covariance. It is now clear that the estimated error 
bar distribution of the shrinkage estimator is a combination of the 
Monte Carlo and target distributions. The shrinkage intensity can 
serve as a proxy for whether the estimated error bars are likely to 
be similar to those for the Monte Carlo or the target. See Section|6] 
for further discussions. 

In summary, the shrinkage of the empirically estimated covari- 
ance against our target matrix outperforms either matrix by itself. 
Using just the empirically estimated covariance brings in too much 
noise which causes error in the estimation of ag itself. Using only 
the diagonal target mitigates the noise problems, but ignores im- 
portant covariance. The shrinkage estimator uses the best aspects 
of both, keeping the part of the covariance that is well estimated 
but drastically reducing the total amount of noise. 



5.5 Jackknife -i- Shrinkage 

Our final method of estimating the covariance matrix applies 
shrinkage to the jackknife estimator to see if we can achieve en- 
hanced robustness. We use the same method to construct a target 
matrix as described in section [5731 using the diagonal of the jack- 
knife estimated covariance matrix in the non-linear regime. We cal- 
culate the shrinkage intensity. A*, and covariance estimate for each 
of the 4096 covariance matrices as described in section 12.21 but 
substituting equation [30] for equation [5] throughout. We find values 
for A* distributed evenly between 0.0 and 1.0 (see Fig.[7]l. We run 
the same tests as described in section [574] and the results are shown 
in Fig. [5] and Table [2| 

As with the shrinkage version of the Monte Carlo estimator, 
the shrinkage version of the jackknife estimator shows significant 
improvement in the actual estimated parameter, o-g. However, the 
central value and width are not quite as good as for the reference 
case. There is some improvement in the estimation of the error bar, 
though the error bars are still systematically underestimated by a 
factor of roughly two compared to the reference. 

Fig. [5] and Table [2| also show the results of estimating ag and 
A using only the diagonal targets used in the shrinkage version of 
the jackknife estimator. Again, the diagonal target matrix does well 
for estimating ag due to the lack of noise, but it gives the worst 
estimates of the error bars. 
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In this case, the shrinkage version of the jackknife estimator 
did the best job of estimating the error bars, and it was only shghtly 
worse than the diagonal approximation at recovering the distribu- 
tion of f78. Again, shrinkage estimation is doing an excellent job 
of keeping information about covariance while reducing the total 
noise. 



6 DISCUSSIONS AND CONCLUSIONS 

We have introduced shrinkage as a technique for improving esti- 
mates of the covariance matrix for power spectrum measurements. 
We tested our methods on dark matter simulations and showed im- 
provement over the empirically estimated covariance matrix from 
a limited number of simulations or jackknife resamplings. In order 
to clearly assess the potential improvement from using shrinkage 
estimation, we chose an intentionally difficult scenario where tradi- 
tional methods of estimating the covariance were unlikely to yield 
satisfactory results. All of these methods would perform better if 
we allowed ourselves more simulations per Monte Carlo estimate 
or if we did not push as far into the non-linear clustering regime. 
The shrinkage technique would still outperform the other methods, 
but perhaps the differences would be less obvious. 

A good estimate of the covariance matrix of a power spectrum 
measurement is essential for extracting cosmological information 
via parameter fitting. Including the covariance between different 
bins is a good step towards properly estimating the confidence in- 
tervals on cosmological parameters. However, the increased num- 
ber of free parameters of a full covariance estimate (as opposed to 
a diagonal approximation) can cause the covariance estimate to be 
noisy if only a relatively small number of simulations are available. 
This noise can adversely affect the estimate of the parameter it- 
self. A diagonal approximation to the covariance can be more easily 
constrained with a limited number of simulations, leading to better 
estimates of the parameter values. However, the confidence inter- 
vals can be severely underestimated if actual covariance is ignored. 
Neither alternative is appealing. If a similar measurement was per- 
formed with the two-point correlation function, the Fourier dual of 
the power spectrum, a full covariance matrix is especially impor- 
tant as bins will be strongly correlated, even in the linear clustering 
regime. Realistic survey geometries will also cause additional co- 
variance on large scales for the power spectrum. 

Shrinkage estimation is an optimal way of combining a model 
with many degrees of freedom and a model with few degrees of 
freedom to minimize the total error on the covariance estimate. In 
our example the shrinkage versions of the Monte Carlo and jack- 
knife estimators clearly outperformed their counterparts without 
shrinkage, with the shrinkage version of t he Monte Carlo estima tor 
producing the best results. The lemma oflLedoit & Wolj j2003h as 
employed bv Schafer & Strimmej feOOSh allows a mathematically 
and numerically simple way of calculating the optimal shrinkage 
intensity. This means that there is minimal addition work required 
to use a shrinkage version of a covariance estimator Shrinkage es- 
timation can result in a massive improvement in the limit of a small 
number of simulations and will not adversely affect the covariance 
estimate in the limit of a large number of simulations. For these 
reasons we always recommend the use of the shrinkage versions of 
covariance estimators in all regimes. 

We briefly investigated the effects of shrinkage estimation in 
the limit of a large (though not infinite) number of simulations. We 
applied shrinkage estimation to our reference covariance matrix 
estimated from all 4096 sub-volumes using the target from equa- 
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Figure 6. Plot of the (sorted) eigenvalue spectrum for the Reference covari- 
ance matrix, empirically estimated from 4096 sub-volumes, and the shrink- 
age of that reference covariance matrix against our diagonal target (Refer- 
ence+S). The lower panel shows the ratio of these eigenvalue spectra. 



tion[29]and found an optimal shrinkage intensity A* = 0.0096. 
This number is the same order as the relative noise we expect in 
each element of the matrix, 1/ V 4096 = 0.0156. We then calcu- 
lated the eigensystems of both matrices. The dot products of the 
corresponding eigenvectors always exceeded 0.996, indicating that 
they are essentially identical. The (sorted) eigenvalue spectra are 
shown in Fig. |6] The eigenvalues are the same to within 1% for 
the first 10 eigenmodes. After the tenth eigenmode the eigenval- 
ues from the reference matrix become increasingly smaller com- 
pared to the shrinkage version. By the final eigenmode the dif- 
ference is ~ 50%. The shrinkage version of the reference matrix 
should be a more accurate estimate of the true underlying covari- 
ance matrix. The non-linear nature of matrix inversion can cause 
errors ^ 1% even when individual elements of the covariance 
matrix are estimated to ~ 1%. We ran our parameter estimation 
test using the shrinkage version of the reference matrix and found 
that {(Ts) moved by less than 0.5%. This is small compared to the 
width of the distribution, which is ~ 5%. The average minimum 
did improve from 52.4 to 41.1 with the shrinkage version of 
the covariance matrix, though this is still large for 18 — 1 = 17 
degrees of freedom. The remaining discrepancy is dominated by 
bias from problems with modeling the power spectrum into the 
non-linear regime or power loss in the simulation at smaller scales 
due to low resolution, not a grossly inaccurate estimate of the vari- 
ances. The amplitude is mainly sensitive to smooth eigenmodes, 
which have large eigenvalues, so there is little change in the es- 
timated value. Parameters that are more sensitive to the shape of 
the power spectrum may be more sensitive to the lower eigenvalue 
modes and show more than a 1% change. The impact of these dif- 
ferences could be estimated with a study of the information content 
of the power spectru m covariance in terms of co smological param- 
eter confidences (i.e., iNevrinck & SzaDudil2007h , but this is beyond 
the scope of this paper 
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Figure 7. Plot of estimated error bar. A, as a function of the shiinkage 
intensity. A*, for the shrinkage versions of the Monte Carlo (MC+S) and 
jackknife (JK+S) methods. For clarity all of the MC+S error bars are plotted 
as positive and all of the JK+S as negative. 



We employed a very simple diagonal target matrix in this 
paper, but better targets can clearly improve the efficiency of the 
shrinkage technique. A much more realistic model for covariance 
on small scales could be constructed using the halo model. For re- 
alistic measurements it may also be advantageous to model some 
of the effects of survey geometries, redshift-space distortions, and 
clustering bias. Targets that depend on a small number of free pa- 
rameters may be very useful for some of these effects (e.g., cluster- 
ing bias). Targets can also be developed for a wide range of large- 
scale structure measurements in addition to the power spectrum. 
The exploration of more sophisticated targets is beyond the scope 
of this paper and is left to future studies. 

Ultimately we would like to develop more diagnostics of the 
performance of our covariance estimates. Fig. |7] shows the esti- 
mated error bar. A, as a function of the shrinkage intensity, A*, 
for the shrinkage versions of the Monte Carlo and jackknife esti- 
mators. There is clearly some correlation for the shrinkage version 
of the Monte Carlo estimator, so knowledge of A* could help one 
gauge how much the error bars are underestimated. The exploration 
of such diagnostics should proceed as better targets are developed. 

The difficulties in estimating the power spectrum covariance 
matrix in the context of making precision cosmological measure- 
ments are of even greater concern for higher-order clustering mea- 
surements. Higher-order clustering measurements have a configu- 
ration space with more degrees of freedom than the power spec- 
trum (or two-point correlation function). Even a lower resolution 
measurement will have more bins and a much larger covariance 
matrix, and noise will cause larger deviations in the inverse ma- 
trix. Theoretical modeling of the covariance matrix for an N-point 
correlation function gene rally involves correlations up to the 2N- 
point (e.g., ISzapudil200^ . making the models more uncertain. The 
ability to optimally combine simulations and a theoretical model 
with a small number of free parameters will make dramatic im- 
provements. Shrinkage estimators could also be used for covariance 
matrices of measurements outside of large-scale structure, includ- 
ing the cosmic microwave background power spectrum. Finally, 
we note that Section |2!T] makes no specific references to covari- 
ance matrices and that shrinkage is a general estimation technique. 



We are studying additional applications of shrinkage estimation for 
cosmological measurements. 
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